I/O-Efficient Similarity Join
Authors
Abstract
Similar resources
Efficient Privacy Preserving Protocols for Similarity Join
During the similarity join process, one or more sources may not allow sharing their data with other sources. In this case, a privacy-preserving similarity join is required. We showed in our previous work [4] that using long attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, could improve similarity join accuracy using supervised learning. However, ...
An Efficient Similarity Join Algorithm with Cosine Similarity Predicate
Given a large collection of objects, finding all pairs of similar objects, known as similarity join, is widely used to solve various problems in many application domains. Computation time is a critical issue for similarity join, since it requires computing similarity values for all possible pairs of objects. Several existing algorithms adopt prefix filtering to avoid unnecessary similari...
An Efficient Parallel Algorithm for High Dimensional Similarity Join
Multidimensional similarity join finds pairs of multidimensional points that are within some small distance of each other. The ε-k-d-B tree has been proposed as a data structure that scales better than previous data structures as the number of dimensions increases. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf ...
Efficient string similarity join in multi-core and distributed systems
In the big data era, a significant challenge for string similarity join is finding all similar pairs more efficiently. In this paper, we propose a parallel processing framework for efficient string similarity join. First, the input is split into small disjoint subsets according to the joint frequency distribution and the interval distribution of strings. Then the filter-verification strategy...
Efficient Similarity Join of Large Sets of Spatio-temporal Trajectories
We address the problem of performing efficient similarity join for large sets of moving-object trajectories. Unlike previous approaches, which use a dedicated index in a transformed space, our premise is that in many applications of location-based services, the trajectories are already indexed in their native space, in order to facilitate the processing of common spatio-temporal queries, e.g., ...
Journal
Journal title: Algorithmica
Year: 2017
ISSN: 0178-4617, 1432-0541
DOI: 10.1007/s00453-017-0285-5